Introduction to Data Analytics

Welcome to ANLY 500

What is Analytics? - Possible Definition 1

What is Analytics? - Possible Definition 2

Scope of Analytics?

What is Descriptive Analytics? (1)

An Example of what to Expect in Descriptive Analytics: Ex.1.1

library(datasets) 
data("sunspot.month") # load a dataset bundled with the datasets package
head(sunspot.month)
#> [1] 58.0 62.6 70.0 55.7 85.0 83.5

An Example of what to Expect in Descriptive Analytics: Ex.1.1

str(sunspot.month)
#>  Time-Series [1:3177] from 1749 to 2014: 58 62.6 70 55.7 85 83.5 94.8 66.3 75.9 75.5 ...

An Example of what to Expect in Descriptive Analytics: Ex.1.1

summary(sunspot.month)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00   15.70   42.00   51.96   76.40  253.80

An Example of what to Expect in Descriptive Analytics: Ex.1.1

library(ggplot2)
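# Convert the time series to a data frame; the values land in a column named `x`.
# A new name is used so the built-in `sunspot.month` series is not overwritten.
sunspot_df <- as.data.frame(sunspot.month)
# Time is the observation index (months since the start of the series)
sunspot_df$Time <- seq_len(nrow(sunspot_df))
ggplot(sunspot_df, aes(x = Time, y = x)) + 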
  geom_point(alpha = 0.5) + 
  ylab("Number of Sunspots") + 
  xlab("Time") +
  theme_classic()
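
As a rough sketch of one further descriptive step, the monthly counts can be averaged by calendar year using the series' own time index. The names `monthly`, `sunspot_years`, and `yearly_mean` are illustrative and not part of the original example; only base R is assumed.

```r
library(datasets)

# Use the packaged copy so a modified `sunspot.month` in the workspace does not interfere
monthly <- datasets::sunspot.month

# time() returns fractional years (e.g. 1749.083); floor() recovers the calendar year.
# The small offset guards against floating-point values just below a whole year.
sunspot_years <- floor(as.numeric(time(monthly)) + 1e-8)

# Mean monthly sunspot number per calendar year (hypothetical summary, not in the slides)
yearly_mean <- tapply(as.numeric(monthly), sunspot_years, mean)
head(yearly_mean)

# Line plot of the yearly averages
plot(as.numeric(names(yearly_mean)), yearly_mean, type = "l",
     xlab = "Year", ylab = "Mean Monthly Number of Sunspots")
```

Averaging by year smooths out month-to-month noise, which can make the long-run cycle easier to see than in the raw scatter plot.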

What is Predictive Analytics?

What is Predictive Analytics?

An Example of what to Expect in Predictive Analytics: Ex.2.1

library(quantmod)
# Five-year window ending two days ago (365 * 5 ignores leap days)
start <- Sys.Date() - (365 * 5)
end <- Sys.Date() - 2
getSymbols("AMZN", src = "yahoo", from = start, to = end)
#> [1] "AMZN"
str(AMZN)
#> An 'xts' object on 2016-01-13/2021-01-08 containing:
#>   Data: num [1:1257, 1:6] 621 580 572 577 564 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:6] "AMZN.Open" "AMZN.High" "AMZN.Low" "AMZN.Close" ...
#>   Indexed by objects of class: [Date] TZ: UTC
#>   xts Attributes:  
#> List of 2
#>  $ src    : chr "yahoo"
#>  $ updated: POSIXct[1:1], format: "2021-01-11 20:58:04"

An Example of what to Expect in Predictive Analytics: Ex.2.1

# Fit on the first 1199 trading days; the remaining rows are held out for prediction
predictive_model <- lm(formula = AMZN.Close ~ AMZN.High + AMZN.Low + AMZN.Volume, 
                       data = AMZN[1:1199,])
summary(predictive_model)
#> 
#> Call:
#> lm(formula = AMZN.Close ~ AMZN.High + AMZN.Low + AMZN.Volume, 
#>     data = AMZN[1:1199, ])
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -99.970  -5.430  -0.183   5.698 100.232 
#> 
#> Coefficients:
#>                  Estimate    Std. Error t value
#> (Intercept) 0.03934667083 1.64537794218   0.024
#> AMZN.High   0.48016448701 0.02506871895  19.154
#> AMZN.Low    0.52074716067 0.02576580140  20.211
#> AMZN.Volume 0.00000003449 0.00000027466   0.126
#>                        Pr(>|t|)    
#> (Intercept)               0.981    
#> AMZN.High   <0.0000000000000002 ***
#> AMZN.Low    <0.0000000000000002 ***
#> AMZN.Volume               0.900    
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 15.08 on 1195 degrees of freedom
#> Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
#> F-statistic: 8.047e+05 on 3 and 1195 DF,  p-value: < 0.00000000000000022

An Example of what to Expect in Predictive Analytics: Ex.2.1

# Diagnostic plots 1-5 for the fitted model, arranged in a 2 x 3 grid
par(mfrow = c(2, 3))
plot(predictive_model, which = 1:5)

An Example of what to Expect in Predictive Analytics: Ex.2.1

# Predict the closing price for the held-out rows (1200 through the last row)
n <- nrow(AMZN)
prediction <- stats::predict(predictive_model, AMZN[1200:n,])
tail(data.frame(prediction))
#>            prediction
#> 2020-12-31   3264.328
#> 2021-01-04   3208.529
#> 2021-01-05   3196.080
#> 2021-01-06   3166.064
#> 2021-01-07   3183.745
#> 2021-01-08   3168.485

An Example of what to Expect in Predictive Analytics: Ex.2.1

plot(prediction, type = "l")
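
One way to judge held-out predictions like these is to compare them with the observed values, for example through the root mean squared error (RMSE). The sketch below is a minimal illustration under stated assumptions: the helper `rmse()` and the objects `train`, `holdout`, and `toy_model` are hypothetical names, and the built-in `cars` data stands in for the downloaded AMZN series so that the code runs without a network connection.

```r
# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Stand-in data: built-in `cars` (stopping distance vs. speed), not the AMZN series
train <- cars[1:40, ]
holdout <- cars[41:50, ]

# Fit on the training rows, then predict the held-out rows
toy_model <- lm(dist ~ speed, data = train)
holdout_pred <- predict(toy_model, newdata = holdout)

# Typical size of the prediction error, in the units of the response
holdout_rmse <- rmse(holdout$dist, holdout_pred)
holdout_rmse

# With the objects from Ex.2.1 the analogous call would be (assumption, not run here):
# rmse(as.numeric(AMZN$AMZN.Close[1200:n]), as.numeric(prediction))
```

A smaller RMSE on the held-out rows suggests the model generalizes better to data it was not fitted on.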

What is Prescriptive Analytics?

What does this translate into?

What is Data Analytics?

A Subcomponent of Data Analytics is Data Analysis!

A Subcomponent of Data Analytics is Data Analysis!

Other Types of Analysis

How to Correctly Apply Data Analytics?

Breaking Down the Research Process - The Initial Observation

Breaking Down the Research Process - The Initial Observation

Breaking Down the Research Process - The Initial Observation

Breaking Down the Research Process - Generating Theories

Breaking Down the Research Process - Creating a Hypothesis

Breaking Down the Research Process - Testing Theories & Hypotheses

Breaking Down the Research Process - Identifying the Variables

What’s After the Question & Identifying Variables?

What is Data?

Types of Measurements

Categorical Variables

Categorical Levels of Measurement - Binary

Categorical Levels of Measurement - Nominal

Categorical Levels of Measurement - Ordinal
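
In R, the three categorical levels of measurement map naturally onto factors. The following is a small sketch with made-up values; the variables `passed` and `satisfaction` are hypothetical and exist only for illustration.

```r
# Binary: exactly two categories
passed <- factor(c("yes", "no", "yes", "yes"), levels = c("no", "yes"))
nlevels(passed)
#> [1] 2

# Nominal: more than two categories with no inherent order
species <- iris$Species
is.ordered(species)
#> [1] FALSE

# Ordinal: categories with a meaningful order but unknown spacing between them
satisfaction <- factor(c("low", "high", "medium", "low"),
                       levels = c("low", "medium", "high"),
                       ordered = TRUE)
satisfaction[1] < satisfaction[2]  # "low" is ranked below "high"
#> [1] TRUE
```

Only the ordered factor supports comparisons such as `<`; for binary and nominal factors, counting the categories with `table()` is usually the meaningful summary.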

Continuous Variables

Continuous Levels of Measurement - Interval

Continuous Levels of Measurement - Ratio
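
The practical difference between interval and ratio scales can be seen with temperature, a common textbook illustration that is assumed here rather than taken from the slides. Celsius is an interval scale (its zero is arbitrary), while Kelvin is a ratio scale (its zero is a true zero).

```r
# Two temperatures measured in Celsius (interval) and converted to Kelvin (ratio)
celsius <- c(10, 20)
kelvin <- celsius + 273.15

# Differences are meaningful on both scales and agree with each other
diff(celsius)
diff(kelvin)

# Ratios are only meaningful on the ratio scale:
# 20 degrees C is not "twice as hot" as 10 degrees C
celsius[2] / celsius[1]  # 2, an artifact of the arbitrary zero
kelvin[2] / kelvin[1]    # about 1.035, the physically meaningful ratio
```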

Consider Measurement Error:

How Valid Are My Measures?

Are My Measures Reliable?

Breaking Down the Research Process - Collecting the Data

Cross-Sectional Research

Longitudinal Research

Correlational Research

Experimental Research

Experimental Research - Methods

Experimental Research - Methods

Experimental Research - Methods

Breaking Down the Research Process - Methods to Collect the Data

Types of Variation in the Data to Consider:

Breaking Down the Research Process - Analyzing the Data

Population vs Sample

Fitting Models

Fitting Models

tapply(iris$Sepal.Length, iris$Species, mean)
#>     setosa versicolor  virginica 
#>      5.006      5.936      6.588
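
The group means above can themselves be read as a very simple statistical model: each observation is predicted by the mean of its species. The sketch below, which is an illustration rather than part of the original example, shows that a linear model without an intercept reproduces the same means and that its residual sum of squares measures how well those means fit. The names `group_means`, `means_model`, `fitted_vals`, and `sse` are illustrative.

```r
# Group means, as in the example above
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# The same means expressed as a linear model (0 + removes the intercept,
# so each coefficient is the mean of one species)
means_model <- lm(Sepal.Length ~ 0 + Species, data = iris)
coef(means_model)

# Fitted value for every flower: the mean of its own species
fitted_vals <- group_means[as.character(iris$Species)]

# Sum of squared errors: total squared distance between data and model
sse <- sum((iris$Sepal.Length - fitted_vals)^2)
sse

# The hand-computed error matches the model's residual sum of squares
all.equal(sse, deviance(means_model))
```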

Statistical Modeling Parameters

Statistical Modeling Parameters

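# Draw 15 random rows; no seed is set, so the sample means change from run to run.
# The name `iris_sample` avoids masking the base function sample().
iris_sample <- iris[sample(nrow(iris), 15), ]
tapply(iris_sample$Sepal.Length, iris_sample$Species, mean) #sample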
#>     setosa versicolor  virginica 
#>   4.700000   5.966667   6.816667
tapply(iris$Sepal.Length, iris$Species, mean) #population
#>     setosa versicolor  virginica 
#>      5.006      5.936      6.588
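
A single sample of 15 flowers gives only one estimate of each parameter. Repeating the draw many times shows how much such estimates vary around the population value. This is a minimal sketch that treats the 150 rows of `iris` as the population, as the example above does; the seed, the number of repetitions, and the names `population_mean` and `sample_means` are arbitrary choices made for illustration.

```r
set.seed(500)  # arbitrary seed so the sketch is reproducible

# Parameter: mean sepal length over the full data set, treated as the population
population_mean <- mean(iris$Sepal.Length)

# Statistic: mean sepal length in each of 1000 random samples of 15 flowers
sample_means <- replicate(1000, mean(sample(iris$Sepal.Length, size = 15)))

population_mean
mean(sample_means)  # centers close to the population mean
sd(sample_means)    # spread of the estimates (an empirical standard error)

# Distribution of the sample means, with the population mean marked
hist(sample_means, main = "Sample Means (n = 15)", xlab = "Mean Sepal Length")
abline(v = population_mean, lty = 2)
```

Individual sample means can miss the population mean noticeably, but their average tends to sit close to it, and their spread shrinks as the sample size grows.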

Applicable Statistical Models

Summary